Lexical Semantics Annotation for Enriched Portuguese Corpora

Authors

  • Steven Neale
  • Rita Valadas Pereira
  • João Ricardo Silva
  • António Branco
Abstract

The semantic annotation of corpora plays an important role in ensuring that sentences in natural language texts are correctly understood in their intended context. Two kinds of lexical semantic units that contribute to this understanding are word senses, which allow words with multiple meanings to be interpreted according to the context in which they are used, and named entities, which can be disambiguated and linked to the specific encyclopedic resources that describe them. In this paper, we describe the construction of lexical semantically-annotated corpora for Portuguese, annotated with both word senses linked to senses in a Portuguese wordnet and named entities linked to Portuguese Wikipedia entries using DBpedia. The result is a gold-standard lexical semantically-annotated resource that is useful in supporting the training and evaluation of tools for the disambiguation of these lexical units in Portuguese.


Similar resources

The Hinoki Sensebank - A Large-Scale Word Sense Tagged Corpus Of Japanese

While there has been considerable research on both structural annotation (such as the Penn Treebank (Taylor et al., 2003) or the Kyoto Corpus (Kurohashi and Nagao, 2003)) and semantic annotation (e.g. Senseval: Kilgarriff and Rosenzweig, 2000; Shirai, 2002), there are almost no corpora that combine both. This makes it difficult to carry out research on the interaction between syntax and semantic...


VerbLexPor: a lexical resource with semantic roles for Portuguese

This paper presents a lexical resource developed for Portuguese. The resource contains sentences annotated with semantic roles. The sentences were extracted from two domains: Cardiology research papers and newspaper articles. Both corpora were analyzed with the PALAVRAS parser and subsequently processed with a subcategorization frames extractor, so that each sentence that contained at least one...


The Limits of Using FrameNet Frames to Build a Legal Ontology

FrameNet frames have been used to develop lexical databases and annotated corpora for different languages. This paper analyses the use of FrameNet frames to build a legal ontology for the Brazilian Law. In order to discuss the problems of such an approach to ontology development, the lexical units evoking the Criminal_process frame were contrasted in English and Portuguese. Frame divergence betwee...


Corpus-based Induction of a Frame Semantics Projection for LFG

In computational linguistics there is growing insight that high-quality NLP applications for information access (question answering, etc.) are in need of deeper linguistic analysis, in particular, semantic analysis. A bottleneck for semantic processing is the lack of large-scale domain-independent lexical semantic resources. While WordNets for several languages are important lexical resources fo...


Retrieving Lexical Semantics from Multilingual Corpora

This paper presents a technique to build a lexical resource used for annotation of parallel corpora where the tags can be seen as multilingual ‘synsets’. The approach can be extended to add relationships between these synsets that are akin to WordNet relationships of synonymy and hypernymy. The paper also discusses how the success of this approach can be measured. The reported results are for E...



Publication date: 2016